Robust, real-time endpoint detector with energy normalization for ASR in adverse environments
Abstract
When automatic speech recognition (ASR) is applied in hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) situations, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, accurate, and robust endpointing algorithm for real-time ASR. We propose a novel approach that uses a special filter plus a 3-state decision logic for endpoint detection. The filter has been designed under several criteria to ensure the accuracy and robustness of detection. The detected endpoints are then applied to energy normalization simultaneously. Evaluation results show that the proposed algorithm significantly reduces the string error rates on 7 out of 12 tested databases. The reduction rates even exceeded 50% on two of them. The algorithm uses only one-dimensional energy with a 24-frame lookahead; therefore, it has low complexity and is suitable for real-time ASR.
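The abstract does not give the filter coefficients or the exact decision rules, so the following is only a minimal sketch of the general idea: an edge-detecting filter run over a one-dimensional frame-energy contour, followed by a 3-state decision logic. The difference-of-means filter, the state names, the thresholds `rise` and `fall`, the `hang` count, and the 3-frame window are all hypothetical stand-ins chosen for illustration (the paper itself reports a 24-frame lookahead), not the authors' design.

```python
from typing import List, Tuple

SILENCE, IN_SPEECH, LEAVING_SPEECH = 0, 1, 2  # hypothetical state names


def edge_filter(energy: List[float], half_width: int = 3) -> List[float]:
    """Stand-in edge detector: mean of the next `half_width` frames minus
    the mean of the previous `half_width` frames (0 near the borders)."""
    out = [0.0] * len(energy)
    for t in range(len(energy)):
        ahead = energy[t + 1 : t + 1 + half_width]   # lookahead frames
        behind = energy[max(0, t - half_width) : t]  # past frames
        if len(ahead) == half_width and len(behind) == half_width:
            out[t] = sum(ahead) / half_width - sum(behind) / half_width
    return out


def detect_endpoints(
    energy: List[float],
    rise: float = 5.0,    # assumed threshold for a rising edge
    fall: float = -5.0,   # assumed threshold for a falling edge
    hang: int = 5,        # frames to wait before confirming the end
) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs found by a 3-state logic."""
    state, start, end, count = SILENCE, 0, 0, 0
    segments: List[Tuple[int, int]] = []
    for t, value in enumerate(edge_filter(energy)):
        if state == SILENCE:
            if value >= rise:            # rising edge: speech begins
                state, start = IN_SPEECH, t
        elif state == IN_SPEECH:
            if value <= fall:            # falling edge: candidate end
                state, end, count = LEAVING_SPEECH, t, 0
        else:  # LEAVING_SPEECH
            if value >= rise:            # energy rose again: short pause
                state = IN_SPEECH
            else:
                count += 1
                if count >= hang:        # end confirmed
                    segments.append((start, end))
                    state = SILENCE
    if state == IN_SPEECH:               # input ended during speech
        segments.append((start, len(energy) - 1))
    elif state == LEAVING_SPEECH:        # input ended before confirmation
        segments.append((start, end))
    return segments
```

For a synthetic contour of 10 low-energy frames, 20 high-energy frames, and 10 low-energy frames (`[0.0] * 10 + [10.0] * 20 + [0.0] * 10`), this sketch returns `[(8, 28)]`: each edge is flagged about two frames early because the filter looks ahead. The detected endpoints could then delimit the frames over which an energy-normalization reference is computed, as the abstract describes.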
Similar resources
A Robust, Real-time Endpoint Detector with Energy Normalization for ASR in Adverse Environments
When automatic speech recognition (ASR) is applied to hands-free or other adverse acoustic environments, endpoint detection and energy normalization can be crucial to the entire system. In low signal-to-noise ratio (SNR) situations, conventional approaches to endpointing and energy normalization often fail, and ASR performance usually degrades dramatically. The goal of this paper is to find a fast, acc...
Robust endpoint detection and energy normalization for real-time speech and speaker recognition
When automatic speech recognition (ASR) and speaker verification (SV) are applied in adverse acoustic environments, endpoint detection and energy normalization can be crucial to the functioning of both systems. In low signal-to-noise ratio (SNR) and nonstationary environments, conventional approaches to endpoint detection and energy normalization often fail and ASR performances usually degrade ...
Multi-View Face Detection in Open Environments using Gabor Features and Neural Networks
Multi-view face detection in open environments is a challenging task, due to the wide variations in illumination, face appearances and occlusion. In this paper, a robust method for multi-view face detection in open environments, using a combination of Gabor features and neural networks, is presented. Firstly, the effect of changing the Gabor filter parameters (orientation, frequency, standard d...
Robust Iris Recognition in Unconstrained Environments
A biometric system provides automatic identification of an individual based on a unique feature or characteristic possessed by him/her. Iris recognition (IR) is known to be the most reliable and accurate biometric identification system. The iris recognition system (IRS) consists of an automatic segmentation mechanism which is based on the Hough transform (HT). This paper presents a robust IRS i...
Improving the performance of MFCC for Persian robust speech recognition
The Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in automatic speech recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing step which applies to t...